Finding Parts in Very Large Corpora
Abstract
We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus, our method finds part words with 55% accuracy for the top 50 words as ranked by the system. The part list could be scanned by an end-user and added to an existing ontology (such as WordNet), or used as part of a rough semantic lexicon.
Similar resources
Evaluating Two Annotated Corpora of Hindi Using a Verb Class Identifier
In the past few years, Indian languages have seen a welcome arrival of large part-of-speech annotated corpora, thanks to the DIT-funded projects across the country. A major corpus of 50,000 sentences in each of 12 major Indian languages is available for research purposes. This corpus has been annotated for parts of speech using the BIS annotation guideline. However, it remains to be se...
Searching Parallel Corpora for Contextually Equivalent Terms
In this paper, we show how a large bilingual English-French parallel corpus can be brought to bear in terminology search. First, we demonstrate that the coverage of available corpora has become substantially more extensive than that of mainstream term banks. One potential drawback in searching large unstructured corpora is that large numbers of search results may need to be examined before find...
Fast Syntactic Searching in Very Large Corpora for Many Languages
For many linguistic investigations, the first step is to find examples. In the 21st century, they should all be found, not invented. Thus linguists need flexible tools for finding even quite rare phenomena. To support linguists well, they need to be fast even where corpora are very large and queries are complex. We present extensions to the CQL ’Corpus Query Language’ for intuitive creation of ...
Syllable detection in read and spontaneous speech
Automatic syllable detection is an important task when analysing very large speech corpora in order to answer questions concerning prosody, rhythm, speech rate, speech recognition and synthesis. In this paper a new method for automatic detection of syllable nuclei is presented. Two large spoken language corpora (PhonDatII, Verbmobil) were labelled by three phoneticians and then used to adjust t...
Extraction of Translation Equivalents from Parallel Corpora Using Sense-sensitive Contexts
The paper proposes an unsupervised method to extract translation equivalents from parallel corpora. The strategy we use takes into account the context of words. Given a word of the source language and a particular context, we learn its word translation within an equivalent context. We first extract pairs of similar contexts and, then, we compare the similarity between words appearing in each pa...
Publication date: 1999